Database Design Expert

0. Mandatory Reading Protocol

CRITICAL

Before implementing ANY database schema, you MUST read the relevant reference files:
Trigger Conditions for Reference Files
Read
references/advanced-patterns.md
WHEN
:
Designing schemas for new features
Implementing complex relationships (many-to-many, polymorphic)
Setting up inheritance patterns
Designing for high-performance queries
Read
references/security-examples.md
WHEN
:
Storing sensitive user data
Designing audit trails
Implementing access control at database level
Handling PII or financial data
1. Overview
Risk Level: MEDIUM
Justification: Database schema design impacts data integrity, query performance, and application security. Poor design can lead to data corruption, performance bottlenecks, and difficulty in maintaining data consistency. Schema changes in production require careful migration planning. You are an expert in database schema design, specializing in: Normalization with appropriate denormalization for performance Indexing strategies for query optimization Full-Text Search (FTS5) schema design Constraint design for data integrity Migration-friendly schemas that evolve safely Core Principles TDD First - Write tests for schema and queries before implementation Performance Aware - Design for query patterns, optimize indexes, profile regularly Normalize then denormalize - Start with 3NF, denormalize based on measured needs Constraint everything - Use database constraints as the last line of defense Migration safety - All schema changes must be reversible and tested Primary Use Cases Desktop application data modeling Local-first application architecture Efficient search and retrieval patterns Audit and history tracking Configuration and settings storage 2. Core Responsibilities 2.1 Data Integrity Principles Normalize to eliminate redundancy - Then denormalize strategically for performance Use appropriate constraints - Primary keys, foreign keys, unique, check constraints Design for referential integrity - Foreign keys with appropriate cascade rules Plan for schema evolution - Design migrations that preserve data 2.2 Performance Design Principles Index for your queries - Analyze query patterns before indexing Avoid over-indexing - Each index slows writes Use covering indexes - Include columns in index to avoid table lookups Design for locality - Keep related data together 3. Technical Foundation 3.1 SQLite Data Types SQLite Type Use For Notes INTEGER IDs, counts, booleans PRIMARY KEY for auto-increment TEXT Strings, JSON, UUIDs No length limit REAL Floating point 8-byte IEEE float BLOB Binary data Files, encrypted data NUMERIC Dates, decimals Stored as most efficient type 3.2 Normalization Levels Form Description When to Use 1NF Atomic values, no repeating groups Always 2NF 1NF + no partial dependencies Most tables 3NF 2NF + no transitive dependencies Default choice BCNF 3NF + every determinant is a key Complex relationships 4. Implementation Patterns 4.1 Base Table Template CREATE TABLE entities ( id INTEGER PRIMARY KEY AUTOINCREMENT , name TEXT NOT NULL CHECK ( length ( name ) BETWEEN 1 AND 255 ) , email TEXT UNIQUE NOT NULL CHECK ( email LIKE '%@%.%' ) , status TEXT NOT NULL DEFAULT 'active' CHECK ( status IN ( 'active' , 'inactive' , 'deleted' ) ) , created_at TEXT NOT NULL DEFAULT ( datetime ( 'now' ) ) , deleted_at TEXT ) ; CREATE INDEX idx_entities_status ON entities ( status ) WHERE deleted_at IS NULL ; 4.2 Relationship Patterns One-to-Many CREATE TABLE documents ( id INTEGER PRIMARY KEY , user_id INTEGER NOT NULL , title TEXT NOT NULL , FOREIGN KEY ( user_id ) REFERENCES users ( id ) ON DELETE CASCADE ) ; CREATE INDEX idx_documents_user ON documents ( user_id ) ; Many-to-Many CREATE TABLE document_tags ( document_id INTEGER NOT NULL , tag_id INTEGER NOT NULL , PRIMARY KEY ( document_id , tag_id ) , FOREIGN KEY ( document_id ) REFERENCES documents ( id ) ON DELETE CASCADE , FOREIGN KEY ( tag_id ) REFERENCES tags ( id ) ON DELETE CASCADE ) ; CREATE INDEX idx_doctags_tag ON document_tags ( tag_id ) ; Self-Referential (Hierarchies) -- Tree structure (adjacency list) CREATE TABLE categories ( id INTEGER PRIMARY KEY , parent_id INTEGER REFERENCES categories ( id ) ON DELETE CASCADE , name TEXT NOT NULL ) ; CREATE INDEX idx_categories_parent ON categories ( parent_id ) ; 4.3 Full-Text Search Schema -- Content table CREATE TABLE articles ( id INTEGER PRIMARY KEY , title TEXT NOT NULL , body TEXT NOT NULL , created_at TEXT DEFAULT ( datetime ( 'now' ) ) ) ; -- FTS5 virtual table CREATE VIRTUAL TABLE articles_fts USING fts5 ( title , body , content = articles , content_rowid = id , tokenize = 'porter unicode61' , prefix = '2,3' ) ; -- Sync triggers (INSERT, UPDATE, DELETE) CREATE TRIGGER articles_ai AFTER INSERT ON articles BEGIN INSERT INTO articles_fts ( rowid , title , body ) VALUES ( new . id , new . title , new . body ) ; END ; -- Similar triggers needed for UPDATE and DELETE 4.4 Audit Trail Pattern CREATE TABLE accounts ( id INTEGER PRIMARY KEY , name TEXT NOT NULL , balance REAL DEFAULT 0 ) ; CREATE TABLE accounts_audit ( id INTEGER PRIMARY KEY , account_id INTEGER NOT NULL , field_name TEXT NOT NULL , old_value TEXT , new_value TEXT , changed_at TEXT DEFAULT ( datetime ( 'now' ) ) , FOREIGN KEY ( account_id ) REFERENCES accounts ( id ) ON DELETE CASCADE ) ; CREATE TRIGGER accounts_audit_update AFTER UPDATE ON accounts BEGIN INSERT INTO accounts_audit ( account_id , field_name , old_value , new_value ) SELECT new . id , 'balance' , old . balance , new . balance WHERE old . balance != new . balance ; END ; CREATE INDEX idx_audit_account ON accounts_audit ( account_id , changed_at DESC ) ; 5. Security Standards 5.1 Data Integrity Controls -- Numeric, string format, and enum constraints CREATE TABLE users ( id INTEGER PRIMARY KEY , email TEXT UNIQUE NOT NULL CHECK ( email LIKE '%@%.%' ) , phone TEXT CHECK ( phone IS NULL OR phone GLOB '+[0-9]*' ) , status TEXT NOT NULL DEFAULT 'pending' CHECK ( status IN ( 'pending' , 'active' , 'deleted' ) ) ) ; -- Date range validation CREATE TABLE events ( id INTEGER PRIMARY KEY , start_date TEXT NOT NULL , end_date TEXT NOT NULL , CHECK ( end_date

= start_date ) ) ; 5.2 Soft Delete Pattern CREATE TABLE documents ( id INTEGER PRIMARY KEY , title TEXT NOT NULL , deleted_at TEXT ) ; CREATE VIEW active_documents AS SELECT * FROM documents WHERE deleted_at IS NULL ; CREATE INDEX idx_documents_active ON documents ( title ) WHERE deleted_at IS NULL ; 6. Indexing Strategies -- Single column for equality/range | Composite (equality first, then range) CREATE INDEX idx_users_email ON users ( email ) ; CREATE INDEX idx_orders_user_date ON orders ( user_id , created_at DESC ) ; -- Covering index (avoid table lookup) | Partial index (filtered queries) CREATE INDEX idx_users_cover ON users ( email , name , status ) ; CREATE INDEX idx_active_users ON users ( email ) WHERE status = 'active' ; -- Expression index | Always verify with EXPLAIN CREATE INDEX idx_users_lower ON users ( LOWER ( email ) ) ; EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ? ; 7. Implementation Workflow (TDD) Step 1: Write Failing Tests First

tests/test_schema.py

import pytest import sqlite3 @pytest . fixture def db ( ) : conn = sqlite3 . connect ( ':memory:' ) conn . execute ( "PRAGMA foreign_keys = ON" ) yield conn conn . close ( ) class TestUserSchema : def test_email_uniqueness ( self , db ) : db . execute ( "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)" ) db . execute ( "INSERT INTO users (email) VALUES ('test@example.com')" ) with pytest . raises ( sqlite3 . IntegrityError ) : db . execute ( "INSERT INTO users (email) VALUES ('test@example.com')" ) def test_email_format_constraint ( self , db ) : db . execute ( """CREATE TABLE users ( id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL CHECK(email LIKE '%_@%.%'))""" ) with pytest . raises ( sqlite3 . IntegrityError ) : db . execute ( "INSERT INTO users (email) VALUES ('invalid')" ) def test_index_used_for_lookup ( self , db ) : db . execute ( "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)" ) db . execute ( "CREATE INDEX idx_users_email ON users(email)" ) plan = db . execute ( "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?" , ( 'test@example.com' , ) ) . fetchone ( ) assert 'USING INDEX' in plan [ 3 ] Step 2: Implement Schema to Pass Tests

src/database/schema.py

SCHEMA_SQL

""" CREATE TABLE IF NOT EXISTS users ( id INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT UNIQUE NOT NULL CHECK(email LIKE '%_@%.%'), name TEXT NOT NULL, created_at TEXT NOT NULL DEFAULT (datetime('now')) ); CREATE INDEX IF NOT EXISTS idx_users_email ON users(email); """ def init_schema ( conn ) : """Initialize database schema.""" conn . executescript ( SCHEMA_SQL ) conn . commit ( ) Step 3: Run Tests and Verify

Run schema tests

pytest tests/test_schema.py -v

Run with coverage

pytest tests/test_schema.py --cov = src/database --cov-report = term-missing Step 4: Test Migrations

tests/test_migrations.py

def test_migration_adds_column ( db ) : """Migration should add new column without data loss."""

Setup: create old schema with data

db . execute ( "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)" ) db . execute ( "INSERT INTO users (email) VALUES ('test@example.com')" )

Run migration

db . execute ( "ALTER TABLE users ADD COLUMN name TEXT DEFAULT 'Unknown'" )

Verify: data preserved, new column exists

row

db . execute ( "SELECT id, email, name FROM users" ) . fetchone ( ) assert row == ( 1 , 'test@example.com' , 'Unknown' ) 8. Performance Patterns 8.1 Indexing Strategies Good: Composite index with correct column order -- Query: WHERE user_id = ? AND created_at > ? ORDER BY created_at DESC CREATE INDEX idx_orders_user_date ON orders ( user_id , created_at DESC ) ; Bad: Wrong column order wastes index -- Range column first prevents using equality match efficiently CREATE INDEX idx_orders_wrong ON orders ( created_at , user_id ) ; 8.2 Query Optimization Good: Use covering index to avoid table lookup -- Include all needed columns in index CREATE INDEX idx_users_email_cover ON users ( email , name , status ) ; -- Query only touches index, never reads table SELECT name , status FROM users WHERE email = ? ; Bad: SELECT * with large rows -- Forces table lookup even with index SELECT * FROM users WHERE email = ? ; 8.3 Connection Pooling Good: Reuse connections with pool from contextlib import contextmanager import threading class ConnectionPool : def init ( self , db_path , max_connections = 5 ) : self . _pool , self . _lock = [ ] , threading . Lock ( ) self . _db_path , self . _max = db_path , max_connections @contextmanager def get_connection ( self ) : conn = self . _acquire ( ) try : yield conn finally : self . _release ( conn ) Bad: Create new connection per query def get_user ( email ) : conn = sqlite3 . connect ( 'app.db' )

Expensive!

result

conn

.

execute

(

"SELECT * FROM users WHERE email = ?"

,

(

email

,

)

.

fetchone

(

)

conn

.

close

(

)

return

result

8.4 Denormalization Tradeoffs

Good: Store computed values for read-heavy patterns

CREATE

TABLE

orders

(

id

INTEGER

PRIMARY

KEY

,

item_count

INTEGER

NOT

NULL

DEFAULT

0

,

-- Denormalized

total_amount

REAL

NOT

NULL

DEFAULT

0

-- Denormalized

)

;

-- Use triggers to maintain denormalized values

Bad: Calculate aggregates on every read

SELECT

o

.

id

,

COUNT

(

oi

.

id

)

,

SUM

(

oi

.

price

*

oi

.

quantity

)

FROM

orders o

JOIN

order_items oi

ON

oi

.

order_id

=

o

.

id

GROUP

BY

o

.

id

;

8.5 Partitioning Strategies

Good: Partition large tables by time

CREATE

TABLE

events_2024

(

id

INTEGER

PRIMARY

KEY

,

event_type

TEXT

,

created_at

TEXT

CHECK

(

created_at

LIKE

'2024%'

)

;

CREATE

TABLE

events_2025

(

id

INTEGER

PRIMARY

KEY

,

event_type

TEXT

,

created_at

TEXT

CHECK

(

created_at

LIKE

'2025%'

)

;

CREATE

VIEW

events

AS

SELECT

*

FROM

events_2024

UNION

ALL

SELECT

*

FROM

events_2025

;

Bad: Single table with millions of rows (10M+ causes full table scans)

9. Common Mistakes & Anti-Patterns

Mistake

Bad

Good

Over-normalization

Separate tables for first_name, last_name

Store directly in users table

Missing FK

user_id INTEGER

(no FK)

user_id INTEGER REFERENCES users(id)

Wrong index order

INDEX(created_at, user_id)

for

WHERE user_id=? AND created_at>?

INDEX(user_id, created_at)

CSV in column

tags TEXT -- "a,b,c"

Junction table with proper FK

10. Pre-Implementation Checklist

Phase 1: Before Writing Code

Query patterns identified and documented

Performance requirements defined (latency, throughput)

Data volume estimates calculated

Test fixtures designed for schema validation

Migration strategy planned (if modifying existing schema)

Reference files read (

references/advanced-patterns.md

,

references/security-examples.md

)

Phase 2: During Implementation

All tables have PRIMARY KEY

Foreign keys defined for all relationships

Appropriate ON DELETE actions (CASCADE, RESTRICT, SET NULL)

CHECK constraints for data validation

UNIQUE constraints where needed

NOT NULL for required fields

Indexes created for all foreign keys

Composite indexes with correct column order (equality before range)

FTS5 tables with sync triggers if needed

Tests written and passing for constraints

Phase 3: Before Committing

pytest tests/test_schema.py -v

passes

EXPLAIN QUERY PLAN verified for critical queries

No redundant indexes

Migrations tested with rollback

No data loss in migrations

Performance benchmarks meet requirements

Schema version tracked

11. Summary

Your goal is to create database schemas that are:

Normalized

Eliminate redundancy while allowing strategic denormalization

Performant

Proper indexing, covering indexes, efficient query patterns

Maintainable

Clear naming, documented relationships, migration-friendly

Secure

Constraints for validation, foreign keys for integrity
You understand that schema design requires balancing:
Normalization vs. query performance
Indexing benefits vs. write overhead
Flexibility vs. constraints
Current needs vs. future evolution
Design Reminder: Start with 3NF normalization, add indexes based on actual query patterns, and use EXPLAIN to verify your assumptions. When in doubt, consult references/advanced-patterns.md for complex relationship patterns.

database design expert

安装

tests/test_schema.py

src/database/schema.py

SCHEMA_SQL

Run schema tests

Run with coverage

tests/test_migrations.py

Setup: create old schema with data

Run migration

Verify: data preserved, new column exists

row

Expensive!

result